We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.
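As a rough illustration of the overall loop described above (not the paper's actual implementation), the following Python sketch shows how the pieces fit together: random-walk bootstrapping generates goals of increasing difficulty, policy rollout provides the improvement step, and the learning step fits a policy directly rather than a value function. The `domain` and `learner` interfaces used here (`actions`, `successor`, `satisfies`, `fit`) are hypothetical stand-ins for a relational planning domain and a relational policy learner.

```python
import random

def random_walk_problem(domain, start, walk_length):
    """Bootstrapping: take a random walk of `walk_length` steps from the
    start state and treat the endpoint as the goal. Problem difficulty
    grows with the walk length."""
    state = start
    for _ in range(walk_length):
        state = domain.successor(state, random.choice(domain.actions(state)))
    return start, state

def rollout_value(domain, policy, state, action, goal, horizon=50):
    """Estimate Q(state, action) under `policy` with a single rollout;
    reward is 1 on reaching the goal, 0 otherwise (sparse goal reward)."""
    state = domain.successor(state, action)
    for _ in range(horizon):
        if domain.satisfies(state, goal):
            return 1.0
        state = domain.successor(state, policy(state, goal))
    return 0.0

def api_with_policy_learning(domain, start, learner, iterations=10,
                             n_problems=100, horizon=50):
    """Approximate policy iteration where the usual value-function
    approximation step is replaced by supervised learning in policy space.
    Assumes `learner.fit` returns a callable policy(state, goal)."""
    policy = lambda s, g: random.choice(domain.actions(s))  # uninformed start
    walk_length = 1
    for _ in range(iterations):
        examples = []
        for _ in range(n_problems):
            state, goal = random_walk_problem(domain, start, walk_length)
            for _ in range(horizon):
                # Policy-rollout improvement: label each visited state with
                # the action whose rollout Q-estimate is highest.
                best = max(domain.actions(state),
                           key=lambda a: rollout_value(domain, policy,
                                                       state, a, goal))
                examples.append((state, goal, best))
                if domain.satisfies(state, goal):
                    break
                state = domain.successor(state, best)
        # Learning step in policy space: fit a compact (relational) policy
        # that imitates the rollout-improved action choices.
        policy = learner.fit(examples)
        walk_length += 1  # lengthen the walks as the policy improves
    return policy
```

In this sketch the walk length is increased on a fixed schedule; a more faithful variant would lengthen the walks only once the current policy solves the sampled problems reliably, so that training difficulty tracks policy quality.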